I. INTRODUCTION

Unraveling the mechanism for electroweak symmetry breaking and the generation of mass of elementary particles has been a priority in experimental particle physics research during the last decades. In the standard model (SM) [1] this is accomplished by introducing an SU(2) doublet of self-interacting elementary scalars, the "Higgs field", whose non-zero vacuum expectation value breaks the electroweak symmetry and generates the masses of the W and Z bosons [2]. The postulated Yukawa interactions between the fermions and the Higgs field also give mass to the fermions upon the breaking of the electroweak symmetry. Furthermore, a physical scalar particle appears in the spectrum, the Higgs boson (H), whose mass is not predicted and must be determined experimentally.

Within the SM, indirect constraints from precision electroweak observables [3] limit the allowed range for the Higgs boson mass (M_H) to M_H < 152 GeV at the 95% confidence level (CL). Direct searches at the CERN e+e− Collider (LEP) [4] set a lower limit of M_H > 114.4 GeV at 95% CL. At hadron colliders the dominant production mechanisms for a SM Higgs boson are gluon fusion (GF) (gg → H), associated production with a W or Z boson (qq̄′ → VH, V = W, Z), and vector boson fusion (VBF) (VV → H). However, the search strategies for a light SM Higgs boson are different at the Fermilab Tevatron pp̄ Collider and at CERN's Large Hadron pp Collider (LHC).

At the Tevatron, the most sensitive SM Higgs boson searches for M_H < 130 GeV rely on the VH production mode, with H → bb̄, while for M_H > 130 GeV the main search mode is gg → H → W+W−. The combination of searches at the Tevatron [5] has resulted in the mass ranges 100 < M_H < 103 GeV and 147 < M_H < 180 GeV being excluded at the 95% CL. In the allowed intermediate mass range an excess is found with a maximum local significance of 3.1 standard deviations (s.d.) at M_H = 125 GeV, primarily originating from the VH (H → bb̄) searches [6].

At the LHC, the search strategy for M_H > 140 GeV also capitalizes on the GF production mode, exploiting primarily the H → W+W− and H → ZZ decay modes with leptonic W and Z boson decays. The H → γγ decay mode becomes one of the most promising discovery channels at lower M_H, despite its small branching fraction of B(H → γγ) ≈ 0.2%, owing to its clean experimental signature of a narrow resonance on top of a smoothly falling background in the diphoton mass (M_γγ) spectrum. Searches for H → ZZ(*) → ℓ+ℓ−ℓ′+ℓ′− (ℓ, ℓ′ = e, μ) are also sensitive due to the small background and excellent four-lepton invariant mass resolution. The most recent searches for the SM Higgs boson at the LHC [7, 8] exclude a SM Higgs boson with M_H < 600 GeV, except for the narrow mass range ≈ 122–127 GeV. In this mass range both the ATLAS and the CMS Collaborations observe a significant excess of events in data at M_H ≈ 125 GeV with local significances of 5.9 and 5.0 s.d., respectively. These excesses are driven by smaller excesses observed in searches focused on H → γγ and H → ZZ(*) decays, while no significant excesses have been found in searches targeting fermionic decay modes (H → bb̄ and H → τ+τ−) with the datasets analyzed so far.

Searches for H → γγ are particularly sensitive to new particles beyond the SM contributing to the loop-mediated Hgg and/or Hγγ vertices, and to deviations in the couplings between the SM particles and the Higgs boson from those predicted by the SM. For example, alternative models of electroweak symmetry breaking [9] can involve suppressed couplings of the Higgs boson to fermions, with the extreme case being the fermiophobic Higgs boson (H_f) scenario, in which H_f has no tree-level couplings to fermions but has SM coupling to weak gauge bosons. In this scenario the GF production mechanism is absent, decays into fermions are heavily suppressed, and B(H → γγ) is significantly enhanced. The best-fit cross sections to the signal-like excesses in the H → γγ searches at the LHC show small deviations of about 1.5 s.d. above the SM prediction [7, 8]. A more detailed global fit to Higgs boson couplings [10] shows no significant deviations. Hence, the analysis of more data is needed for more definitive conclusions. Searches for a fermiophobic Higgs boson were performed by the LEP Collaborations [11], the CDF [12] and D0 [13] Collaborations and, most recently, by the ATLAS [14] and CMS [15] Collaborations. The most restrictive limits result from the combination of H → γγ, H → W+W− and H → ZZ searches by the CMS Collaboration, excluding the mass range 110 < M_Hf < 194 GeV.

In this Article, we present the result of the search for a Higgs boson decaying into γγ using the complete dataset collected with the D0 detector in pp̄ collisions at √s = 1.96 TeV during Run II of the Tevatron Collider. This search employs multivariate techniques to improve the signal-to-background discrimination, and is separately optimized for a SM Higgs boson and for a fermiophobic Higgs boson. Compared to the previous D0 publication [13], the sensitivity for the SM Higgs boson is improved by about 40%, resulting in the most restrictive limits to date from the Tevatron in this decay mode. The search for a fermiophobic Higgs boson has comparable sensitivity to the most recent result from the CDF Collaboration [12]. This result constitutes an important input for the upcoming publications on combinations of Higgs boson searches by the D0 experiment, as well as by both Tevatron experiments, using the complete Run II dataset.



II. D0 DETECTOR AND DATA SET

The D0 detector is described in detail elsewhere [16]. The subdetectors most relevant to this analysis are the central tracking system, composed of a silicon microstrip tracker (SMT) and a central fiber tracker (CFT) in a 2 T solenoidal magnetic field, the central preshower (CPS), and the liquid-argon and uranium sampling calorimeter.

The SMT has about 800,000 individual strips, with typical pitch of 50–80 μm, and a design optimized for tracking and vertexing capability at pseudorapidities of |η| < 2.5. The system has a six-barrel longitudinal structure, each with a set of four layers arranged axially around the beam pipe, and interspersed with 16 radial disks. In the summer of 2006 an additional layer of silicon sensors was inserted at a radial distance of ≈ 16 mm from the beam axis, and the two outermost radial disks were removed. The CFT has eight thin coaxial barrels, each supporting two doublets of overlapping scintillating fibers of 0.835 mm diameter, one doublet being parallel to the collision axis, and the other alternating by ±3° relative to the axis. Light signals are transferred via clear fibers to visible light photon counters (VLPC) that have about 80% quantum efficiency.

The CPS is located just outside of the superconducting 
magnet coil (in front of the calorimetry) and is formed 
by one radiation length of absorber followed by several 
layers of extruded triangular scintillator strips that are 
read out using wavelength-shifting fibers and VLPCs. 

The calorimeter consists of three sections housed in separate cryostats: a central calorimeter covering up to |η| ≈ 1.1, and two end calorimeters extending the coverage up to |η| ≈ 4.2. Each section is divided into electromagnetic (EM) layers on the inside and hadronic layers on the outside. The EM part of the calorimeter is segmented into four longitudinal layers with transverse segmentation of Δη × Δφ = 0.1 × 0.1 [17], except in the third layer (EM3), where it is 0.05 × 0.05. The calorimeter is well suited for a precise measurement of electron and photon energies, providing a resolution of ≈ 3.6% at energies of ≈ 50 GeV.

Luminosity is measured using plastic scintillator arrays located in front of the end calorimeter cryostats, covering 2.7 < |η| < 4.4. Trigger and data acquisition systems are designed to accommodate the high luminosities of Run II. Based on preliminary information from tracking, calorimetry, and muon systems, the output of the first level of the trigger is used to limit the rate for accepted events to about 2 kHz. At the next trigger stage, with more refined information, the rate is reduced further to about 1 kHz. These first two levels of triggering rely mainly on hardware and firmware. The third and final level of the trigger, with access to all the event information, uses software algorithms and a computing farm, and reduces the output rate to about 100 Hz, which is written to tape.

This analysis uses the complete dataset of pp̄ collisions at √s = 1.96 TeV recorded with the D0 detector during Run II of the Tevatron Collider. The data are acquired using triggers requiring at least two clusters of energy in the EM calorimeter with loose shower shape requirements and varying transverse momentum (p_T) thresholds between 15 GeV and 25 GeV. The trigger efficiency is close to 100% for final states containing two photon candidates with p_T > 25 GeV. Only events for which all subdetector systems are fully operational are considered. The analyzed dataset corresponds to an integrated luminosity of 9.6 fb⁻¹ [18].



III. EVENT SIMULATION

Monte Carlo (MC) samples of Higgs boson signal are generated separately for the GF, VH and VBF processes using the PYTHIA [19] leading-order (LO) event generator with the CTEQ6L1 [20] parton distribution functions (PDFs). Signal samples are generated for 100 < M_H < 150 GeV, in increments of 5 GeV. Signal samples are normalized using the next-to-next-to-leading order (NNLO) plus next-to-next-to-leading-logarithm (NNLL) cross sections for GF [21] and NNLO for VH and VBF processes [?], computed with the MSTW 2008 PDF set [13]. The Higgs boson branching fraction predictions are from HDECAY [25]. To improve the signal modeling for the GF process, the p_T of the Higgs boson is corrected to match the prediction at NNLO+NNLL accuracy by the HQT program [26]. In the case of the fermiophobic model, where the GF process is absent, the VH and VBF cross sections are normalized to the SM prediction, while the modified H → γγ branching fractions are computed with HDECAY.

The main background affecting this search is direct photon pair (DPP) production, where two isolated photons with high transverse momenta are produced. The rest of the backgrounds are of instrumental origin and include γ+jet (γj) and dijet (jj) production, where at least one jet is misidentified as a photon. A smaller instrumental background originates from Z/γ* → e+e− production, where both electrons are misidentified as photons. The normalization and shape of the γj and jj backgrounds, as well as the overall normalization of the DPP background, are estimated from data, as discussed in Sect. V. The shape of the DPP background is modeled via a MC sample generated using SHERPA [27] with the CTEQ6L1 PDF set. Recent measurements of DPP differential cross sections [28] have shown that SHERPA provides an adequate model of this process in the kinematic region of interest for this search. The Z/γ* → e+e− process is modeled using ALPGEN [29] with the CTEQ6L1 PDF set, interfaced to PYTHIA for parton showering and hadronization, with a subsequent correction to the p_T spectrum of the Z boson to match measurements in data [30]. The Z/γ* → e+e− MC sample is normalized to the NNLO theoretical cross section [31].

All MC samples are processed through a GEANT-based [32] simulation of the D0 detector. To accurately model the effects of multiple pp̄ interactions and detector noise, data events from random pp̄ crossings that have an instantaneous luminosity spectrum similar to the events in this analysis are overlaid on the MC events. These MC events are then processed using the same reconstruction algorithms as used on the data. Simulated events are corrected so that the physics object identification efficiencies, energy scales and energy resolutions match those determined in data control samples.



IV. OBJECT IDENTIFICATION AND EVENT SELECTION



A. Photon reconstruction and energy scale 

Photon candidates are formed from clusters of calorimeter cells within a cone of radius R = √((Δη)² + (Δφ)²) = 0.4 around a seed tower. The final cluster energy is then recalculated from the inner core with R = 0.2. The photon candidates are selected by requiring: (i) at least 95% of the cluster energy is deposited in the EM calorimeter layers, (ii) the calorimeter isolation I = [E_tot(0.4) − E_EM(0.2)]/E_EM(0.2) < 0.1, where E_tot(0.4) is the total energy in a cone of radius R = 0.4 and E_EM(0.2) is the EM energy in a cone of radius R = 0.2, (iii) the scalar sum of the p_T of all tracks (p_T,trk^sum) originating from the hard-scatter pp̄ collision vertex (see Sect. IV B) in an annulus of 0.05 < R < 0.4 around the EM cluster is less than 2 GeV, and (iv) the energy-weighted EM shower width is consistent with that expected for an electromagnetic shower. This analysis only considers photon candidates with pseudorapidity |η^γ| < 1.1.
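As an illustration only, requirements (i) to (iii) and the pseudorapidity restriction can be sketched in Python as follows. The function names and argument layout are assumptions made for this sketch and do not correspond to the D0 reconstruction software; the shower-width requirement (iv) is not modeled because its definition is not given in the text.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Cone distance R = sqrt((d_eta)^2 + (d_phi)^2), with d_phi wrapped to [-pi, pi]."""
    dphi = math.remainder(phi1 - phi2, 2.0 * math.pi)
    return math.hypot(eta1 - eta2, dphi)


def calorimeter_isolation(e_tot_04, e_em_02):
    """Isolation I = [E_tot(0.4) - E_EM(0.2)] / E_EM(0.2)."""
    return (e_tot_04 - e_em_02) / e_em_02


def passes_photon_preselection(em_fraction, e_tot_04, e_em_02, sum_track_pt, abs_eta):
    """Apply the thresholds quoted in the text (energies and p_T in GeV).

    Hypothetical helper: requirement (iv) on the shower width is omitted.
    """
    return (
        em_fraction >= 0.95                                   # (i) EM energy fraction
        and calorimeter_isolation(e_tot_04, e_em_02) < 0.1    # (ii) isolation
        and sum_track_pt < 2.0                                # (iii) track isolation
        and abs_eta < 1.1                                     # central calorimeter only
    )
```

Wrapping the azimuthal difference before taking the quadrature sum matters for clusters close to φ = ±π, where a naive difference would overestimate R.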

To suppress electrons misidentified as photons, the EM clusters are required not to be spatially matched to significant tracker activity, either a track, or a pattern of hits in the SMT and CFT consistent with that of an electron or positron trajectory [33]. In the following, this requirement will be referred to as a "track-match" veto.

To suppress jets misidentified as photons, an artificial neural network (NN) discriminant, which exploits differences in tracker activity and energy deposits in the calorimeter and CPS between photons and jets, is defined [34]. The photon NN is trained using diphoton and dijet MC samples generated using PYTHIA, using the following discriminating variables: p_T,trk^sum, the numbers of cells above a certain threshold requirement in the first EM calorimeter layer within R < 0.2 and within 0.2 < R < 0.4 of the EM cluster, the number of associated CPS clusters within R < 0.1 of the EM cluster, and a measure of the width of the energy deposition in the CPS. The performance of the photon NN is verified using a data event sample consisting of photons radiated from charged leptons in Z boson decays (Z → ℓ+ℓ−γ, ℓ = e, μ) [33]. Figure 1 compares the NN output (O_NN) distributions of photons and jets. The shape of the O_NN distribution for photons is found to be in good agreement between data and the MC simulation and is significantly different from the shape for misidentified jets. The latter is validated using a sample enriched in jets misidentified as photons as discussed in Sect. V. Photon candidates are required to have an O_NN value larger than 0.1, which is close to 100% efficient for photons while rejecting approximately 40% of the remaining misidentified jets.

FIG. 1: Comparison of the normalized O_NN spectra for photons from DPP MC simulations and Z → ℓ+ℓ−γ data events (points with statistical error bars), and for misidentified jets from simulated dijet events.

The measured photon energies are calibrated using a two-step correction procedure. In the first step, the energy response of the calorimeter to photons is calibrated using electrons from Z boson decays. The resulting corrections are then applied to all electromagnetic clusters. Since electrons and photons shower differently, with electrons suffering from a larger energy loss in material upstream of the calorimeter, the application of this first set of corrections results in an overestimate of the photon energy which depends on η^γ. In the second step, additional corrections are derived for photons reconstructed in the central calorimeter using a detailed GEANT-based simulation of the D0 detector response. These corrections are derived as a function of photon transverse momentum (p_T^γ) in seven intervals of η^γ: |η^γ| < 0.4, 0.4 < |η^γ| < 0.6, 0.6 < |η^γ| < 0.7, 0.7 < |η^γ| < 0.8, 0.8 < |η^γ| < 0.9, 0.9 < |η^γ| < 1.0, and 1.0 < |η^γ| < 1.1, and separately for photons with and without a matched CPS cluster. The per-photon probability to have a matched CPS cluster is measured using photons radiated from charged leptons in Z boson decays (Z → ℓ+ℓ−γ, ℓ = e, μ) and is ≈ 73%. The finer binning at higher η^γ is motivated by the strong dependence of the energy-loss corrections for electrons on η^γ. The resulting corrections for photons with (without) a matched CPS cluster are largest at low p_T^γ ≈ 20 GeV and range from about −1.5% in the |η^γ| < 0.4 interval to about −6% (−10%) in the |η^γ| > 1.0 interval.



B. Primary vertex reconstruction 

At the Tevatron the distribution of pp̄ collision vertices has a Gaussian width of about 25 cm. The proper reconstruction of the event kinematics, in particular p_T^γ and thus M_γγ, requires the reconstruction and then correct selection of the hard-scatter pp̄ collision primary vertex (PV) among the various candidate PVs originating from additional pp̄ interactions.

The algorithm used for PV reconstruction is described in detail elsewhere [?]. In a first step, tracks with two or more associated SMT hits and p_T > 0.5 GeV are clustered along the z direction. This is followed by a Kalman filter fit [37] of the tracks in each cluster to a common vertex. Events are required to have at least one reconstructed PV with a z coordinate (z_PV) within 60 cm of the center of the detector, a requirement that is ≈ 98% efficient.

The selection of the hard-scatter PV from the list of PV candidates with |z_PV| < 60 cm is based on an algorithm exploiting both the track multiplicity of the different vertices and the transverse and longitudinal energy distributions in the EM calorimeter and the CPS. These energy distributions allow the estimation of the photon direction and thus the z coordinate of its production vertex along the beam direction. When one or both photons reconstructed in the EM calorimeter also deposit part of their energy in the CPS, the algorithm chooses the PV whose z_PV is closest to the extrapolation of the photon trajectory determined from the calorimeter and the CPS information [38], provided the distance between the coordinates of the vertex and of the photon trajectory is smaller than 3 s.d. The uncertainty on this distance is dominated by the uncertainty on the extrapolation of the photon direction, which ranges from ≈ 2.5 cm for photons with |η^γ| < 0.4 to ≈ 4.3 cm for photons with |η^γ| > 0.8. Otherwise, the algorithm chooses the PV with the largest multiplicity of associated tracks.
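The decision logic described above can be summarized in a short Python sketch. This is a simplified reading of the text, not the D0 implementation: the data layout (tuples of vertex z and track multiplicity, and a single pointing estimate with its uncertainty) and the function name are assumptions, and the treatment of events with two pointing photons is not specified in the text and is left out.

```python
def select_hard_scatter_pv(vertices, pointing=None, max_nsigma=3.0, z_max=60.0):
    """Pick the hard-scatter primary vertex.

    vertices : list of (z_cm, n_tracks) tuples for the reconstructed PV candidates
    pointing : optional (z_cm, sigma_cm) from the photon-direction extrapolation,
               available when a photon has a matched CPS cluster
    Returns the chosen (z_cm, n_tracks) tuple, or None if no candidate has |z| < z_max.
    """
    candidates = [v for v in vertices if abs(v[0]) < z_max]
    if not candidates:
        return None

    if pointing is not None:
        z_point, sigma = pointing
        closest = min(candidates, key=lambda v: abs(v[0] - z_point))
        # Accept the pointing choice only if it is compatible within max_nsigma s.d.
        if abs(closest[0] - z_point) < max_nsigma * sigma:
            return closest

    # Fallback: vertex with the largest multiplicity of associated tracks.
    return max(candidates, key=lambda v: v[1])
```

For example, with candidates at z = 1 cm (30 tracks) and z = 12 cm (8 tracks) and a pointing estimate of 11 ± 2.5 cm, the sketch returns the second vertex even though it has fewer tracks.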

This algorithm is optimized using Z/γ* → e+e− data events, where the correct hard-scatter PV associated with the reconstructed tracks is treated as corresponding to a diphoton event by ignoring the track information from the e+e− pair, and added to the list of PV candidates to which the selection algorithm will be applied. The fraction of Z/γ* → e+e− events for which the selected PV agrees with the known hard-scatter PV is shown in Fig. 2 as a function of diphoton transverse momentum (p_T^γγ) for two different hard-scatter PV selection algorithms. For an algorithm selecting the hard-scatter PV as the one with the highest track multiplicity, the average selection probability is only ≈ 65% and shows a significant dependence on p_T^γγ. The improved algorithm used in this analysis, including also photon pointing information, achieves an average selection probability of ≈ 95%, almost constant as a function of p_T^γγ.

FIG. 2: Probability to select the correct hard-scatter PV as a function of p_T^γγ as measured in Z/γ* → e+e− events excluding the electron and positron tracks from consideration. The two different algorithms discussed in the text are compared.

FIG. 3: Distribution of the reconstructed diphoton invariant mass corresponding to a Higgs boson signal with M_H = 125 GeV. The line shows the result of a fit to the distribution using the functional form described in Sect. IV D.



C. Event selection 

At least two photon candidates satisfying the requirements listed in Sect. IV A and having p_T^γ > 25 GeV and |η^γ| < 1.1 are required. If more than two photon candidates are identified, only the two photon candidates with highest p_T^γ are considered. At least one of the photon candidates in each event is required to have a matched CPS cluster. The photon kinematic variables are computed with respect to the vertex selected using the algorithm described in Sect. IV B. A requirement of M_γγ > 60 GeV is made to ensure a trigger efficiency close to 100%.

The acceptance of the kinematic requirements is ≈ 42%, as estimated by applying the p_T^γ and η^γ requirements to generated photons in a gg → H → γγ MC sample assuming M_H = 125 GeV. At the same assumed M_H, the overall event selection efficiency, taking into account acceptance and reconstruction, identification and selection efficiencies, is ≈ 22%, almost independent of the signal production mechanism.

To improve the sensitivity to signal, events are categorized into two statistically independent samples with different signal-to-background ratios. Events where both photon candidates satisfy O_NN > 0.75 ("photon-enriched" sample) and events where at least one photon candidate satisfies 0.1 < O_NN < 0.75 ("jet-enriched" sample) are analyzed separately. The corresponding sample compositions are discussed in Sect. V.
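The categorization can be written compactly as below. This is a minimal sketch with a hypothetical function name; the text does not say how a value of exactly 0.75 is treated, so the sketch arbitrarily assigns it to the jet-enriched sample.

```python
def categorize_event(onn_leading, onn_trailing, loose=0.1, tight=0.75):
    """Return 'photon-enriched', 'jet-enriched', or None if the event is rejected.

    Both photon candidates must pass the O_NN > 0.1 preselection. The event is
    photon-enriched only if both candidates also satisfy O_NN > 0.75.
    """
    if min(onn_leading, onn_trailing) <= loose:
        return None  # fails the photon identification requirement
    if onn_leading > tight and onn_trailing > tight:
        return "photon-enriched"
    return "jet-enriched"  # at least one candidate with 0.1 < O_NN <= 0.75
```

Because every selected event falls into exactly one of the two categories, the two samples are statistically independent by construction.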



D. Invariant mass reconstruction 

After the selection of the pp̄ collision vertex and the photon energy scale corrections, the M_γγ distribution for a Higgs boson signal is approximately Gaussian, peaking at the generated Higgs boson mass, with small non-Gaussian tails. This distribution can be modeled by the sum of a Crystal Ball function [39], describing a narrow Gaussian core and a power-law tail toward lower masses, and a wider Gaussian distribution, describing tails from misvertexing or imperfect photon energy scale corrections. Figure 3 shows such a fit to the inclusive M_γγ spectrum for signal MC with M_H = 125 GeV. The resolution of the Gaussian core is found to be ≈ 3.1 GeV, and varies by ±13% when varying M_H by ±25 GeV.
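For orientation, the functional form can be sketched in Python using the standard (unnormalized) Crystal Ball parameterization. Only the peak position of 125 GeV and the core resolution of 3.1 GeV come from the text; the tail parameters `alpha` and `n`, the width of the second Gaussian, and its relative fraction are placeholder values chosen for this sketch, not fit results.

```python
import math


def crystal_ball(x, mean, sigma, alpha, n):
    """Unnormalized Crystal Ball: Gaussian core with a power-law tail at low x.

    Assumes alpha > 0. The two branches match in value at (x - mean)/sigma = -alpha.
    """
    t = (x - mean) / sigma
    if t > -alpha:
        return math.exp(-0.5 * t * t)
    a = (n / alpha) ** n * math.exp(-0.5 * alpha * alpha)
    b = n / alpha - alpha
    return a * (b - t) ** (-n)


def signal_shape(x, mean=125.0, sigma_core=3.1, alpha=1.5, n=3.0,
                 sigma_wide=8.0, wide_fraction=0.1):
    """Crystal Ball core plus a wider Gaussian, scaled to 1 at the peak.

    alpha, n, sigma_wide and wide_fraction are hypothetical illustration values.
    """
    core = crystal_ball(x, mean, sigma_core, alpha, n)
    wide = math.exp(-0.5 * ((x - mean) / sigma_wide) ** 2)
    return (1.0 - wide_fraction) * core + wide_fraction * wide
```

In an actual fit the shape would be normalized over the mass range and its parameters floated; the sketch only shows how the two components combine.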



V. BACKGROUND MODELING AND SAMPLE 
COMPOSITION 

The normalization and shape of the Z/γ* → e+e− background are estimated using simulation. Electrons are misidentified as photons at a rate of about 2% due to track reconstruction inefficiencies. This tracking inefficiency is measured in data using a "tag-and-probe" method, where Z → e+e− events are selected with one of the electrons ("tag") passing all identification criteria, including matching of the track to the calorimeter cluster, while only calorimeter requirements are applied to the other electron ("probe"). The electron misidentification rate is computed as the fraction of events where the probe electron satisfies the "track-match" veto requirement defined in Sect. IV A. The misidentification rate measured in data in this way is applied to the simulated Z/γ* → e+e− sample.
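The rate computation in the tag-and-probe method reduces to a simple fraction. The sketch below also attaches a binomial uncertainty, which is an assumption of this illustration since the text does not state how the uncertainty is evaluated; the event counts in the usage note are invented for the example.

```python
import math


def misid_rate(n_probes, n_probes_passing_veto):
    """Fraction of probe electrons that satisfy the track-match veto.

    Returns (rate, binomial_uncertainty). Hypothetical helper for illustration.
    """
    rate = n_probes_passing_veto / n_probes
    uncertainty = math.sqrt(rate * (1.0 - rate) / n_probes)
    return rate, uncertainty
```

With 10,000 probe electrons of which 200 pass the veto, the sketch gives a rate of 2.0% with an uncertainty of about 0.14%.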

The γj and jj yields are estimated using a data-driven method [40] ("matrix method"). For selected events, the two photons are separated into two types: those with O_NN > 0.75 (well-identified photon, "p") and those with 0.1 < O_NN < 0.75 (likely fake photon, "f"). Events are then classified in four categories: (i) two type-p photons, (ii) the higher-p_T (leading) photon is type p and the lower-p_T (trailing) photon is type f, (iii) the leading photon is type f and the trailing photon is type p, and (iv) two type-f photons. The corresponding numbers of events, after subtracting the Z/γ* → e+e− contribution, are denoted as N_pp, N_pf, N_fp and N_ff. The different efficiencies of the O_NN > 0.75 requirement for photons (ε_γ) and jets (ε_j) are used to estimate the sample composition by solving a system of linear equations:

$$(N_{\gamma\gamma},\, N_{\gamma j},\, N_{j\gamma},\, N_{jj}) = (N_{pp},\, N_{pf},\, N_{fp},\, N_{ff}) \times \mathcal{E}^{-1}, \qquad (1)$$

where N_γγ (N_jj) is the number of γγ (jj) events and N_γj (N_jγ) is the number of γ+jet events with the leading (trailing) cluster as the photon. The 4×4 matrix ℰ is constructed with the efficiency terms ε_γ and ε_j, parameterized as a function of |η^γ| for each photon candidate as determined from photon and jet MC samples, respectively. The ε_γ and ε_j efficiencies averaged over |η^γ| are ≈ 76% and ≈ 35%, respectively. The efficiency ε_γ is validated with a data sample of photons radiated from charged leptons in Z boson decays (Z → ℓ+ℓ−γ, ℓ = e, μ). The efficiency ε_j is validated using two independent control data samples enriched in jets misidentified as photons, either by inverting the photon isolation variable (I > 0.1), or by requiring at least one track in a cone of R < 0.05 around the photon [41]. In the following, the sum of γj and jγ contributions will be denoted as γj for simplicity. The shapes of kinematic distributions for γj (jj) background are obtained from independent control samples by requiring one (two) photon candidate(s) to satisfy O_NN < 0.1. The O_NN < 0.1 requirement leads to a mismodeling of the η^γ spectrum, due to the |η^γ| dependence of ε_j. This is corrected by assigning a weight factor defined as ε_j(|η^γ|)/(1 − ε_j(|η^γ|)) for each of the photon candidates with O_NN < 0.1.
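The structure of Eq. (1) can be illustrated with a small Python sketch. It makes two simplifying assumptions that are not part of the analysis as described: the efficiencies are taken as constants (the averages of 76% and 35% quoted in the text) rather than functions of |η^γ| per candidate, and the two candidates are treated as independent, so that the 4×4 matrix is the Kronecker product of two identical 2×2 single-candidate matrices. All function names are hypothetical.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def kron2(a, b):
    """Kronecker product of two 2x2 matrices, index order (leading, trailing)."""
    return [[a[i][k] * b[j][l] for k in range(2) for l in range(2)]
            for i in range(2) for j in range(2)]


def row_times_matrix(row, m):
    """Row vector times matrix."""
    return [sum(row[i] * m[i][j] for i in range(len(row))) for j in range(len(m[0]))]


def single_candidate_matrix(eff_gamma, eff_jet):
    """Rows: true type (photon, jet). Columns: observed type (p, f)."""
    return [[eff_gamma, 1.0 - eff_gamma], [eff_jet, 1.0 - eff_jet]]


def expected_counts(truth, eff_gamma, eff_jet):
    """(N_gg, N_gj, N_jg, N_jj) -> (N_pp, N_pf, N_fp, N_ff)."""
    s = single_candidate_matrix(eff_gamma, eff_jet)
    return row_times_matrix(truth, kron2(s, s))


def solve_composition(observed, eff_gamma, eff_jet):
    """(N_pp, N_pf, N_fp, N_ff) -> (N_gg, N_gj, N_jg, N_jj), as in Eq. (1).

    Uses the identity inverse(A kron B) = inverse(A) kron inverse(B).
    """
    s_inv = inv2(single_candidate_matrix(eff_gamma, eff_jet))
    return row_times_matrix(observed, kron2(s_inv, s_inv))
```

A closure check is to start from an assumed composition, compute the expected (pp, pf, fp, ff) counts, and confirm that `solve_composition` returns the input composition.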

As discussed in Sect. III, the kinematics of the DPP background are predicted using SHERPA. Since the estimated N_γγ from solving Eq. (1) could include a contribution from signal events, it is only used as a prior normalization for the DPP background to compare between data and background prediction. The normalization of the DPP background is ultimately determined from an unconstrained fit to the final discriminants used for hypothesis testing in both the photon-enriched and jet-enriched samples. For each of these samples, two distributions are considered: a multivariate discriminant (see Sect. VI) constructed to maximize the separation between signal and background for events with M_γγ falling in the interval M_H ± 30 GeV ("search region"), and the M_γγ spectrum for events outside this interval ("sideband region") that provide a high-statistics background-dominated sample. A comparison between data and the background prediction for the M_γγ spectrum, separately in the photon-enriched and the jet-enriched samples, is shown in Fig. 4.

FIG. 4: (color online). Distribution of M_γγ in (a) the photon-enriched sample and (b) the jet-enriched sample. The data (points with statistical error bars) are compared to the background prediction, broken down into its individual components. The expected distributions for a SM Higgs boson and a fermiophobic Higgs boson with M_H = 125 GeV are also shown scaled by a factor of 100.

Tables I and II summarize the number of data events, expected backgrounds, and expected SM and fermiophobic Higgs boson signals, resulting from the fit for five hypothesized Higgs boson masses, for the photon-enriched and jet-enriched samples, respectively. For M_H = 125 GeV, the estimated background composition for the photon-enriched sample in the M_γγ interval of [95 GeV, 155 GeV] is about 80% (DPP), 14% (γj), 3% (jj) and 3% (Z/γ* → e+e−). The corresponding composition for the jet-enriched sample is about 48% (DPP), 31% (γj), 18% (jj) and 3% (Z/γ* → e+e−).

M_H (GeV)           105            115            125            135                145
γγ (DPP)            2777 ± 65      1928 ± 44      1355 ± 31      980 ± 22           721 ± 17
γj                  704 ± 40       407 ± 24       238 ± 14       144 ± 9            88 ± 6
jj                  183 ± 16       93 ± 9         54 ± 6         34 ± 4             19 ± 2
Z/γ* → e+e−         [illegible]    [illegible]    51 ± 11        22 ± [illegible]   11 ± [illegible]
Total background    3883 ± 61      2577 ± 45      1698 ± 30      1180 ± 21          839 ± 16
Data                3777           2475           1664           1147               813
H signal            3.6 ± 0.4      3.5 ± 0.4      3.0 ± 0.4      2.2 ± 0.3          1.4 ± 0.2
H_f signal          49.8 ± 1.1     14.0 ± 0.3     4.8 ± 0.1      1.9 ± 0.1          0.79 ± 0.03

TABLE I: Signal, backgrounds and data yields for the photon-enriched sample within the M_H ± 30 GeV mass window, for M_H = 105 GeV to M_H = 145 GeV in 10 GeV intervals. The background yields are from a fit to the data. The uncertainties include both statistical and systematic contributions added in quadrature and take into account correlations among processes. The uncertainty on the total background is smaller than the sum in quadrature of the uncertainties in the individual background sources due to the anti-correlation resulting from the fit.



M_H (GeV)           105            115            125            135            145
γγ (DPP)            1969 ± 47      1406 ± 33      1012 ± 24      734 ± 17       545 ± 13
γj                  1852 ± 100     1101 ± 60      653 ± 36       391 ± 22       251 ± 15
jj                  1188 ± 94      647 ± 54       365 ± 31       219 ± 19       135 ± 12
Z/γ* → e+e−         227 ± 39       152 ± 28       61 ± 11        30 ± 7         20 ± 5
Total background    5236 ± 67      3307 ± 45      2091 ± 29      1374 ± 21      951 ± 17
Data                5287           3384           2156           1422           989
H signal            2.7 ± 0.3      2.6 ± 0.3      2.2 ± 0.3      1.7 ± 0.2      1.1 ± 0.1
H_f signal          34.8 ± 0.8     9.8 ± 0.3      3.4 ± 0.1      1.34 ± 0.04    0.56 ± 0.02

TABLE II: Signal, backgrounds and data yields for the jet-enriched sample within the M_H ± 30 GeV mass window, for M_H = 105 GeV to M_H = 145 GeV in 10 GeV intervals. The background yields are from a fit to the data. The uncertainties include both statistical and systematic contributions added in quadrature and take into account correlations among processes. The uncertainty on the total background is smaller than the sum in quadrature of the uncertainties in the individual background sources due to the anti-correlation resulting from the fit.



VI. SIGNAL-TO-BACKGROUND 
DISCRIMINATION 

The diphoton mass $M_{\gamma\gamma}$ is the most effective discriminating variable between the
Higgs boson signal and the background. However, further discrimination can be achieved by
exploiting additional kinematic variables as well as photon quality variables. A total of ten
well-modeled discriminating variables are considered in this search. Two of these variables
correspond to kinematic properties of the photons: the leading photon transverse momentum
($p_T^{\gamma 1}$) and the trailing photon transverse momentum ($p_T^{\gamma 2}$), which, as
illustrated in Fig. 5, follow a harder spectrum in signal than in background, as expected for the
decay of a heavy resonance. Three of the variables are related to the kinematics of the diphoton
system: $M_{\gamma\gamma}$, the diphoton transverse momentum ($p_T^{\gamma\gamma}$), and the
azimuthal angle separation between the photons ($\Delta\phi_{\gamma\gamma}$). The two latter
variables give discrimination due to the large $p_T$ of the Higgs boson in $VH$ and VBF
production. Therefore, as illustrated in Fig. 6, $p_T^{\gamma\gamma}$ and
$\Delta\phi_{\gamma\gamma}$ are particularly sensitive variables in the search for a
fermiophobic Higgs boson.
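
For illustration, the three diphoton-system variables named above can be computed from the transverse momentum, pseudorapidity, and azimuthal angle of each photon. The following is a minimal sketch that treats photons as massless; the function names and the input convention (tuples of $p_T$ in GeV, $\eta$, $\phi$ in radians) are assumptions made for this example and are not taken from the analysis software.

```python
import math

def four_vector(pt, eta, phi):
    """Massless four-vector (E, px, py, pz) from pT, pseudorapidity, azimuth."""
    return (pt * math.cosh(eta),
            pt * math.cos(phi),
            pt * math.sin(phi),
            pt * math.sinh(eta))

def diphoton_variables(photon1, photon2):
    """Return (M_gg, pT_gg, delta_phi_gg) for two (pt, eta, phi) photons."""
    e1, px1, py1, pz1 = four_vector(*photon1)
    e2, px2, py2, pz2 = four_vector(*photon2)
    e, px, py, pz = e1 + e2, px1 + px2, py1 + py2, pz1 + pz2
    # Invariant mass; max() guards against tiny negative rounding errors.
    mass = math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))
    pt = math.hypot(px, py)
    # Azimuthal separation folded into [0, pi].
    dphi = abs(photon1[2] - photon2[2]) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return mass, pt, dphi

# Back-to-back central photons: a resonance produced at rest.
mass, pt, dphi = diphoton_variables((62.5, 0.0, 0.0), (62.5, 0.0, math.pi))
print(f"M = {mass:.1f} GeV, pT = {pt:.1f} GeV, dphi = {dphi:.3f}")
```

A diphoton system recoiling against a $W$ or $Z$ boson or against forward jets would show up in this sketch as a larger $p_T^{\gamma\gamma}$ and a $\Delta\phi_{\gamma\gamma}$ below $\pi$.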

The scalar nature of the Higgs boson affects the angular distributions of the photons in the
diphoton rest frame. To minimize uncertainties from the transverse momentum of the colliding
partons, the Collins-Soper frame [42] is used. In this frame, the $z$ axis is defined as the
bisector of the proton beam momentum and the negative of the antiproton beam momentum when they
are boosted into the center-of-mass frame of the diphoton pair. The variable $\theta^*$ is
defined as the angle between the leading photon momentum and the $z$ axis. The variable $\phi^*$
is defined as the angle between the diphoton plane and the $p\bar{p}$ plane. Due to the
restriction to photons with $|\eta^{\gamma}| < 1.1$ in this analysis, the $\cos\theta^*$
distribution has little discrimination between signal and background, although it is considered
in the search. In contrast, the angle $\phi^*$ provides useful discrimination between signal and
background, particularly for a fermiophobic Higgs boson, as illustrated in Fig. 7(a).
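
As an illustration of the Collins-Soper polar angle, the sketch below evaluates $\cos\theta^*$ with the standard closed-form expression in terms of laboratory-frame four-vectors, which avoids an explicit boost. It assumes massless beams, takes the proton beam along $+z$, and uses a hypothetical function name; the azimuthal angle $\phi^*$ is not computed here, and the expression is not claimed to be the implementation used in the analysis.

```python
import math

def cos_theta_cs(lead, trail):
    """Collins-Soper cos(theta*) of the leading photon.

    lead, trail: laboratory four-vectors (E, px, py, pz).
    Assumption: the proton beam points along +z.
    """
    e1, px1, py1, pz1 = lead
    e2, px2, py2, pz2 = trail
    e, px, py, pz = e1 + e2, px1 + px2, py1 + py2, pz1 + pz2
    m2 = e * e - px * px - py * py - pz * pz      # diphoton mass squared
    pt2 = px * px + py * py                       # diphoton pT squared
    # Closed form: 2 (pz1 E2 - E1 pz2) / (M sqrt(M^2 + pT^2)).
    return 2.0 * (pz1 * e2 - e1 * pz2) / (math.sqrt(m2) * math.sqrt(m2 + pt2))

# Photons emitted along the beam axis give |cos(theta*)| = 1,
# photons emitted transversely give cos(theta*) = 0.
print(cos_theta_cs((50.0, 0.0, 0.0, 50.0), (50.0, 0.0, 0.0, -50.0)))
print(cos_theta_cs((50.0, 50.0, 0.0, 0.0), (50.0, -50.0, 0.0, 0.0)))
```

With both photons restricted to the central calorimeter, large values of $|\cos\theta^*|$ are rarely reached, which is consistent with the limited discrimination noted above.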

A significant fraction of $W$ and $Z$ boson decays in $VH$ production involves neutrinos that
result in large missing transverse energy ($\not\!\!E_T$) in the final state. In contrast, the
$\not\!\!E_T$ in background events is typically low and results mostly from jet energy
mismeasurements. The $\not\!\!E_T$ distribution in the jet-enriched sample is shown in
Fig. 7(b). The $\not\!\!E_T$ is reconstructed as the negative of the vectorial sum of the $p_T$
of calorimeter cells, and is corrected for the $p_T$ of identified muons and the energy
corrections to reconstructed jets in the calorimeter [43].
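
The definition of the missing transverse energy as a negative vector sum can be written compactly. The sketch below is a simplified illustration only: the input format (a list of per-cell transverse momentum and azimuth) and the function name are assumptions, and the corrections for identified muons and for the jet energy scale described above are deliberately left out.

```python
import math

def missing_et(cells):
    """Missing transverse energy from calorimeter cells.

    cells: iterable of (pt, phi) pairs, pt in GeV and phi in radians.
    Returns (magnitude, azimuth) of the negative vector sum.
    Muon and jet-energy corrections are not applied in this sketch.
    """
    sum_x = sum(pt * math.cos(phi) for pt, phi in cells)
    sum_y = sum(pt * math.sin(phi) for pt, phi in cells)
    met_x, met_y = -sum_x, -sum_y
    return math.hypot(met_x, met_y), math.atan2(met_y, met_x)

# Unbalanced toy event: 30 GeV at phi = 0 and 10 GeV at phi = pi.
met, met_phi = missing_et([(30.0, 0.0), (10.0, math.pi)])
print(f"MET = {met:.1f} GeV")
```

In this toy event the 20 GeV imbalance points opposite to the more energetic deposit, as expected for an undetected particle recoiling against the visible activity.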

FIG. 5: (color online). Distribution of (a) $p_T^{\gamma 1}$ in the photon-enriched sample and
(b) $p_T^{\gamma 2}$ in the jet-enriched sample. The data (points with statistical error bars)
are compared to the background prediction, broken down into its individual components. The
expected distributions for a SM Higgs boson and a fermiophobic Higgs boson with $M_H = 125$ GeV
are also shown scaled by a factor of 1000. These two BDT input variables are used in both the
photon-enriched and jet-enriched samples, but are displayed here for only one of the samples for
illustrative purposes.

FIG. 6: (color online). Distribution of (a) $p_T^{\gamma\gamma}$ in the photon-enriched sample
and (b) $\Delta\phi_{\gamma\gamma}$ in the jet-enriched sample. The data (points with
statistical error bars) are compared to the background prediction, broken down into its
individual components. The expected distributions for a SM Higgs boson and a fermiophobic Higgs
boson with $M_H = 125$ GeV are also shown scaled by a factor of 1000. These two BDT input
variables are used in both the photon-enriched and jet-enriched samples, but are displayed here
for only one of the samples for illustrative purposes.

Finally, the $O_{NN}$ distributions for the leading photon ($O_{NN}^{\gamma 1}$) and the
trailing photon ($O_{NN}^{\gamma 2}$) show discrimination between signal and the $\gamma j$ and
$jj$ backgrounds, in particular in the jet-enriched sample, as illustrated in Fig. 8. The
observed discrepancies between the data and the total prediction in the shape of the
distribution are partly covered by the combination of statistical uncertainties on the templates
and the systematic uncertainties, and they have been checked to have a negligible impact on the
final result.

FIG. 7: (color online). Distribution of (a) $\phi^*$ in the photon-enriched sample and
(b) $\not\!\!E_T$ in the jet-enriched sample. The data (points with statistical error bars) are
compared to the background prediction, broken down into its individual components. The expected
distributions for a SM Higgs boson and a fermiophobic Higgs boson with $M_H = 125$ GeV are also
shown scaled by a factor of 1000. These two BDT input variables are used in both the
photon-enriched and jet-enriched samples, but are displayed here for only one of the samples for
illustrative purposes.

FIG. 8: (color online). Distribution of (a) $O_{NN}^{\gamma 1}$ and (b) $O_{NN}^{\gamma 2}$ in
the jet-enriched sample. The data (points with statistical error bars) are compared to the
background prediction, broken down into its individual components. The expected distributions
for a SM Higgs boson and a fermiophobic Higgs boson with $M_H = 125$ GeV are also shown scaled
by a factor of 1000. These two BDT input variables are also used in the photon-enriched sample,
although their discrimination power is limited given the $O_{NN} > 0.75$ requirement applied to
both photons.

To improve the sensitivity of the search, a boosted-decision-tree (BDT) technique [44] is used
to build a single discriminating variable combining the information from the ten variables. A
different BDT is trained for each $M_H$ hypothesis, using events selected in the search region,
corresponding to $M_{\gamma\gamma}$ falling in the interval $M_H \pm 30$ GeV. The training is
performed separately for the SM and the fermiophobic Higgs boson models, considering in each
case the sum of all relevant signals against the sum of all backgrounds. Separate BDTs are
trained in the photon-enriched and jet-enriched samples. The resulting BDT output distributions
assuming a SM and a fermiophobic Higgs boson with $M_H = 125$ GeV are shown in Figs. 9 and 10,
respectively. Prior to fitting the background yields to the data, these distributions are well
modeled by the simulation and no significant excess above the background prediction is observed
at high values of the BDT output.
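
To make the idea of combining several input variables into one discriminant concrete, the sketch below implements a very small boosted decision tree: AdaBoost applied to single-cut trees ("stumps"), trained on a toy sample with two input variables. It is an illustration of the general technique only. The boosting algorithm, tree depth, number of trees, and the toy distributions are all assumptions made for this example, and none of them is taken from the configuration of the BDT of Ref. [44] used in the search.

```python
import math
import random

def stump_predict(row, feature, threshold, sign):
    """Single cut: returns +sign above the threshold, -sign otherwise."""
    return sign if row[feature] > threshold else -sign

def best_stump(X, y, w):
    """Scan all variables and cut values for the lowest weighted error."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for sign in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, feature, threshold, sign) != yi)
                if best is None or err < best[0]:
                    best = (err, feature, threshold, sign)
    return best

def train_bdt(X, y, n_trees=10):
    """AdaBoost over depth-1 trees; labels are +1 (signal), -1 (background)."""
    w = [1.0 / len(X)] * len(X)
    trees = []
    for _ in range(n_trees):
        err, feature, threshold, sign = best_stump(X, y, w)
        err = min(max(err, 1e-9), 1.0 - 1e-9)      # avoid log(0)
        alpha = 0.5 * math.log((1.0 - err) / err)
        trees.append((alpha, feature, threshold, sign))
        # Boosting step: misclassified events gain weight for the next tree.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, feature, threshold, sign))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return trees

def bdt_output(trees, row):
    """Weighted vote normalized to [-1, 1]; larger means more signal-like."""
    norm = sum(t[0] for t in trees)
    return sum(a * stump_predict(row, f, thr, s) for a, f, thr, s in trees) / norm

def toy_sample(n, seed=0):
    """Hypothetical two-variable events (a pT-like and a delta-phi-like input)."""
    rng = random.Random(seed)
    signal = [(rng.gauss(60, 12), rng.gauss(2.4, 0.4)) for _ in range(n)]
    background = [(rng.gauss(40, 12), rng.gauss(2.9, 0.3)) for _ in range(n)]
    return signal + background, [1] * n + [-1] * n

X, y = toy_sample(60)
trees = train_bdt(X, y)
print("output for a signal-like event:", round(bdt_output(trees, (75.0, 2.0)), 2))
```

In the search itself one such discriminant is built per mass hypothesis, per signal model, and per sample, so the toy training above would correspond to a single one of those configurations.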



VII. SYSTEMATIC UNCERTAINTIES 

Systematic uncertainties affecting the normalization and shape of the BDT output distributions
are estimated for both signal and backgrounds, taking into account correlations. Experimental
uncertainties affecting the normalization of the signal and the $Z/\gamma^* \to e^+e^-$
background include the integrated luminosity (6.1%), tracking system live-time correction
(2.0%), trigger efficiency (0.1%), PV reconstruction efficiency (0.2%), and photon
identification efficiency for signal (3.9%) or electron misidentification rate for
$Z/\gamma^* \to e^+e^-$ (12.7%). The impact from PDF uncertainties on the signal acceptance is
1.7%-2.2% depending on $M_H$. Additional sources of uncertainty affecting the normalization
result from uncertainties on the theoretical cross section (including variations of the
renormalization and factorization scales [45] and the PDFs [46]) for signal (GF (14.1%), $VH$
(6.2%), and VBF (4.9%)) and $Z/\gamma^* \to e^+e^-$ (3.9%) production.

FIG. 9: (color online). Distribution of the BDT output used in the SM Higgs boson search in
(a) the photon-enriched sample and (b) the jet-enriched sample. The data (points with
statistical error bars) are compared to the background prediction, broken down into its
individual components. The expected distributions for a SM Higgs boson with $M_H = 125$ GeV are
also shown scaled by a factor of 10.

The normalization uncertainties affecting the $\gamma j$ and $jj$ background predictions result
from propagating the uncertainties on $\epsilon_\gamma$ (1.5%) and $\epsilon_j$ (10%) in the
estimation of their yields via Eq. (1). The uncertainties on the $\gamma j$ and $jj$ yields from
varying $\epsilon_\gamma$ are 6.9% and 5.3%, respectively. The corresponding uncertainties from
varying $\epsilon_j$ are 0.6% and 15.3%, respectively.
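
As a rough guide to the overall size of these normalization uncertainties, the two contributions to each yield can be combined in quadrature. This is only an illustrative estimate: it assumes that the variations of $\epsilon_\gamma$ and $\epsilon_j$ are independent, which is an assumption of this example rather than a statement from the analysis, where correlations are handled in the fit.

```latex
\frac{\delta N_{\gamma j}}{N_{\gamma j}} \approx \sqrt{(6.9\%)^2 + (0.6\%)^2} \approx 6.9\%,
\qquad
\frac{\delta N_{jj}}{N_{jj}} \approx \sqrt{(5.3\%)^2 + (15.3\%)^2} \approx 16.2\%
```

Under this assumption the $\gamma j$ normalization is dominated by $\epsilon_\gamma$ and the $jj$ normalization by $\epsilon_j$.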

FIG. 10: (color online). Distribution of the BDT output used in the fermiophobic Higgs boson
search in (a) the photon-enriched sample and (b) the jet-enriched sample. The data (points with
statistical error bars) are compared to the background prediction, broken down into its
individual components. The expected distributions for a fermiophobic Higgs boson with
$M_{H_f} = 125$ GeV are also shown scaled by a factor of 10.

The remaining systematic uncertainties affect the shape of the BDT output distributions. Such
uncertainties include the photon energy scale (1%-5% for signal, 1%-4% for DPP background), the
modeling of DPP by SHERPA (1%-10%), and the modeling of the Higgs boson $p_T$ spectrum in GF
production (1%-5%). The last two uncertainties are obtained by doubling and halving the
factorization and renormalization scales with respect to the nominal choice. Uncertainties on
the shape of the $\gamma j + jj$ background are 5%-7% and are estimated by comparing the BDT
output distribution from the high-statistics samples obtained by inverting the $O_{NN}$
requirement to those predicted via the matrix method.



VIII. RESULTS 

For each hypothesized $M_H$ value, the BDT output distributions discussed in Sect. VI for the
photon-enriched and jet-enriched samples are used to perform the statistical analysis to search
for a significant signal above the background prediction. As mentioned before, such
discriminants are defined only for events with $M_{\gamma\gamma}$ falling in the
$M_H \pm 30$ GeV interval. The remainder of the $M_{\gamma\gamma}$ spectrum (see Fig. 4) for
both the photon-enriched and jet-enriched samples, corresponding to the sideband regions, is
also included in the statistical analysis as it provides a significant constraint on the DPP
normalization. Therefore, for each $M_H$ a total of four distributions are analyzed.
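
The division of each sample into a search region and sidebands can be sketched as follows. The function name is hypothetical, and treating the window edges as inclusive is an assumption of this example; applying the same split to the photon-enriched and jet-enriched samples yields the four distributions mentioned above.

```python
def split_by_mass_window(masses, m_h, half_width=30.0):
    """Split diphoton masses (GeV) into (search_region, sidebands).

    Events with |M_gg - m_h| <= half_width enter the search region, where
    the BDT output is used; the rest form the M_gg sidebands.
    """
    search, sidebands = [], []
    for m in masses:
        (search if abs(m - m_h) <= half_width else sidebands).append(m)
    return search, sidebands

search, sidebands = split_by_mass_window([70.0, 96.0, 125.0, 154.0, 160.0], 125.0)
print(search, sidebands)
```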

In the absence of a significant data excess above the background prediction, upper limits on
the product of the production cross section and branching fraction
($\sigma \times B(H \to \gamma\gamma)$) are derived as a function of $M_H$, for both the SM and
fermiophobic Higgs boson scenarios. Limits are calculated at the 95% CL with the modified
frequentist approach [47], which employs a log-likelihood ratio (LLR) as test statistic,
$\mathrm{LLR} = -2\ln(L_{s+b}/L_b)$, where $L_{s+b}$ ($L_b$) is a binned likelihood function
(product of Poisson probabilities) to observe the data under the signal-plus-background
(background-only) hypothesis. Pseudo-experiments are generated for both hypotheses, taking into
account per-bin statistical fluctuations of the total predictions according to Poisson
statistics, as well as Gaussian fluctuations describing the effect of systematic uncertainties.
The individual likelihoods are maximized with respect to the DPP background normalization as
well as other nuisance parameters that parameterize the systematic uncertainties [48]. This
global fit determines the normalization of the DPP background directly from data and
significantly reduces the impact of systematic uncertainties on the overall sensitivity.
Examples of the post-fit BDT output distribution, after background subtraction, are shown in
Fig. 11. The fraction of pseudo-experiments for the signal-plus-background (background-only)
hypothesis with LLR larger than a given threshold defines $\mathrm{CL}_{s+b}$
($\mathrm{CL}_b$). This threshold is set to the observed (median) LLR for the observed
(expected) limit. Signal cross sections for which
$\mathrm{CL}_s = \mathrm{CL}_{s+b}/\mathrm{CL}_b < 0.05$ are deemed to be excluded at 95% CL.
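
The logic of the modified frequentist procedure can be illustrated with a stripped-down example. The sketch below evaluates the LLR for binned Poisson counts, generates pseudo-experiments under both hypotheses, and forms $\mathrm{CL}_s$ from the fractions with LLR at or above the observed value. It is a simplified illustration under stated assumptions: systematic uncertainties and the fit of nuisance parameters described above are omitted, the bin contents are invented toy numbers, and the function names are hypothetical.

```python
import math
import random

def llr(n, s, b):
    """LLR = -2 ln(L_{s+b} / L_b) for binned Poisson counts n."""
    return 2.0 * sum(si - ni * math.log(1.0 + si / bi)
                     for ni, si, bi in zip(n, s, b))

def poisson(rng, mean):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-mean)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def cls(n_obs, s, b, n_toys=2000, seed=1):
    """CL_s = CL_{s+b} / CL_b from pseudo-experiments (no systematics)."""
    rng = random.Random(seed)
    llr_obs = llr(n_obs, s, b)
    s_plus_b = [si + bi for si, bi in zip(s, b)]
    toys_sb = [llr([poisson(rng, m) for m in s_plus_b], s, b) for _ in range(n_toys)]
    toys_b = [llr([poisson(rng, m) for m in b], s, b) for _ in range(n_toys)]
    # Fractions of pseudo-experiments at least as background-like as the data.
    cl_sb = sum(t >= llr_obs for t in toys_sb) / n_toys
    cl_b = sum(t >= llr_obs for t in toys_b) / n_toys
    return cl_sb / cl_b if cl_b > 0 else float("nan")

background = [40.0, 20.0, 8.0]          # toy expected background per bin
observed = [40, 20, 8]                  # toy data equal to the background
strong_signal = [20.0, 20.0, 16.0]      # toy signal, easily excluded
print("excluded at 95% CL:", cls(observed, strong_signal, background) < 0.05)
```

To turn this into an upper limit, the signal would be scaled until $\mathrm{CL}_s$ equals 0.05; that scan is omitted here for brevity.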

The resulting upper limits on $\sigma \times B(H \to \gamma\gamma)$ relative to the SM
prediction are shown as a function of $M_H$ in Fig. 12(a) and are summarized in Table III,
representing the most constraining results for a SM Higgs boson decaying into diphotons at the
Tevatron. The corresponding LLR distribution is shown in Fig. 12(b). The observed local excesses
of data are under 2 s.d. and are therefore consistent with background fluctuations. At
$M_H = 125$ GeV the best-fit signal cross section is a factor of $4.2 \pm 4.6$ above the SM
prediction. At the same mass, the value of $\mathrm{CL}_{s+b}$ is 0.72 while the $p$-value for
the background-only hypothesis is $1 - \mathrm{CL}_b = 0.20$.
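
The two numbers quoted at $M_H = 125$ GeV can be combined into $\mathrm{CL}_s$ using the definition given earlier, and the background-only $p$-value can be expressed as an equivalent number of Gaussian standard deviations. The sketch below does both; the one-sided Gaussian conversion is a common convention assumed for this example and is not stated in the text, and the function names are hypothetical.

```python
from statistics import NormalDist

def cls_from(cl_sb, one_minus_cl_b):
    """CL_s = CL_{s+b} / CL_b, given CL_{s+b} and the p-value 1 - CL_b."""
    return cl_sb / (1.0 - one_minus_cl_b)

def one_sided_sigma(p_value):
    """Gaussian-equivalent significance for a one-sided p-value."""
    return NormalDist().inv_cdf(1.0 - p_value)

cl_s = cls_from(0.72, 0.20)
z = one_sided_sigma(0.20)
print(f"CL_s = {cl_s:.2f}, background-only significance = {z:.2f} s.d.")
```

The resulting $\mathrm{CL}_s$ is far above 0.05 and the significance is below 1 s.d., in line with the statement that the data are consistent with background fluctuations.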

Upper limits on $\sigma \times B(H_f \to \gamma\gamma)$ relative to the fermiophobic Higgs model
prediction are shown as a function of $M_{H_f}$ in Fig. 13(a) and are summarized in Table IV.
This translates into an observed (expected) lower limit of $M_{H_f} > 113$ (114) GeV at the 95%
CL. After dividing by the theoretical cross section, upper limits on $B(H_f \to \gamma\gamma)$
are derived as a function of $M_{H_f}$ and presented in Fig. 13(b).



IX. SUMMARY

A search for a Higgs boson decaying into a pair of photons has been presented using
$9.6~\mathrm{fb}^{-1}$ of $p\bar{p}$ collisions at $\sqrt{s} = 1.96$ TeV collected with the D0
detector at the Fermilab Tevatron Collider. The search employs multivariate techniques to
discriminate the signal from the non-resonant background, and is separately optimized for a SM
and a fermiophobic Higgs boson. No significant excess of data above the background prediction
is observed, and upper limits on the product of the cross section and branching fraction are
derived at the 95% CL as a function of $M_H$. For a SM Higgs boson with $M_H = 125$ GeV, the
observed (expected) upper limits are a factor of 12.8 (8.7) above the SM prediction. The
existence of a fermiophobic Higgs boson with mass in the 100-113 GeV range is excluded at the
95% confidence level.

FIG. 11: (color online). Distribution of the BDT output for data (points with statistical error bars) after subtraction of the
fitted background (under the background-only hypothesis) in (a) the photon-enriched sample and (b) the jet-enriched sample,
for $M_H = 125$ GeV. The expected SM Higgs signal is normalized to the observed limit on $\sigma \times B(H \to \gamma\gamma)$. The bands represent
the 1 s.d. uncertainties on the background prediction resulting from the fit.

FIG. 12: (color online). (a) Observed and expected 95% CL limits on the ratio of $\sigma \times B(H \to \gamma\gamma)$ to the SM prediction
as a function of $M_H$. The bands correspond to 1 and 2 s.d. around the median expected limit under the background-only
hypothesis. (b) Observed log-likelihood ratio (LLR) as a function of $M_H$ compared to the expected LLR under the background-only
hypothesis ($\mathrm{LLR}_b$) and signal+background hypothesis ($\mathrm{LLR}_{s+b}$). The bands correspond to 1 and 2 s.d. around
the expected median $\mathrm{LLR}_b$.

TABLE III: Expected and observed upper limits at 95% CL on the cross section times branching fraction for $H \to \gamma\gamma$
($\sigma \times B(H \to \gamma\gamma)$) and on $\sigma \times B(H \to \gamma\gamma)$ relative to the SM prediction for a SM Higgs boson as a function of $M_H$.

FIG. 13: (color online). (a) Observed and expected 95% CL limits on the ratio of $\sigma \times B(H_f \to \gamma\gamma)$ to the fermiophobic Higgs
model prediction as a function of $M_{H_f}$. The bands correspond to 1 and 2 s.d. around the median expected limit under the
background-only hypothesis. (b) Observed and expected 95% CL limits on $B(H_f \to \gamma\gamma)$ as a function of $M_{H_f}$. The bands
correspond to 1 and 2 s.d. around the median expected limit under the background-only hypothesis. Also shown is the
prediction for a fermiophobic Higgs boson.

TABLE IV: Expected and observed upper limits at 95% CL on the cross section times branching fraction for $H_f \to \gamma\gamma$
($\sigma \times B(H_f \to \gamma\gamma)$) and on $B(H_f \to \gamma\gamma)$ for a fermiophobic Higgs boson as a function of $M_{H_f}$. Also given are the theoretical
predictions for $\sigma \times B(H_f \to \gamma\gamma)$ and $B(H_f \to \gamma\gamma)$ as a function of $M_{H_f}$.